Multi-scale 3D Convolution Network for Video Based Person Re-Identification
This paper proposes a two-stream convolution network to extract spatial and
temporal cues for video based person Re-Identification (ReID). A temporal
stream in this network is constructed by inserting several Multi-scale 3D (M3D)
convolution layers into a 2D CNN network. The resulting M3D convolution network
adds only a small fraction of parameters to the 2D CNN, but gains the ability to
learn multi-scale temporal features. With this compact architecture, the M3D
convolution network is also more efficient and easier to optimize than existing
3D convolution networks. The temporal stream further involves Residual
Attention Layers (RAL) to refine the temporal features. By jointly learning
spatial-temporal attention masks in a residual manner, RAL identifies the
discriminative spatial regions and temporal cues. The other stream in our
network is implemented with a 2D CNN for spatial feature extraction. The
spatial and temporal features from the two streams are finally fused for video
based person ReID. Evaluations on three widely used benchmark datasets, i.e.,
MARS, PRID2011, and iLIDS-VID, demonstrate the substantial advantages of our
method over existing 3D convolution networks and state-of-the-art methods.
Comment: AAAI, 201
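As a rough illustration of the two ideas in this abstract, the sketch below applies 1D temporal convolutions at several dilation rates to a sequence of per-frame features and adds the branches back to the input, then rescales features with a mask in a residual form. It is a simplified sketch under stated assumptions, not the M3D or RAL layers from the paper: the function names, the 3-tap kernel weights, the dilation rates (1, 2, 3), and the `x * (1 + mask)` form are all hypothetical choices, and spatial dimensions are omitted.

```python
def temporal_conv(seq, kernel, dilation):
    """Zero-padded 1D convolution along time with a 3-tap kernel."""
    T = len(seq)
    out = []
    for t in range(T):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - 1) * dilation  # taps at t - d, t, t + d
            if 0 <= idx < T:
                acc += w * seq[idx]
        out.append(acc)
    return out


def multi_scale_temporal(seq, kernel=(0.25, 0.5, 0.25), dilations=(1, 2, 3)):
    """Sum the input with one dilated temporal branch per scale (assumed design)."""
    out = list(seq)
    for d in dilations:
        branch = temporal_conv(seq, kernel, d)
        out = [o + b for o, b in zip(out, branch)]
    return out


def residual_attention(features, mask):
    """Residual masking: a zero mask leaves the feature unchanged (assumed form)."""
    return [f * (1.0 + m) for f, m in zip(features, mask)]


# A single active frame (index 2) in a 7-frame sequence.
frames = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
features = multi_scale_temporal(frames)
refined = residual_attention([1.0, 2.0], [0.0, 0.5])
```

With the largest dilation set to 3, the single active frame influences frames up to three steps away, which is the sense in which the branches cover several temporal scales.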
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification
Although the performance of person Re-Identification (ReID) has been
significantly boosted, many challenging issues in real scenarios have not been
fully investigated, e.g., the complex scenes and lighting variations, viewpoint
and pose changes, and the large number of identities in a camera network. To
facilitate the research towards conquering those issues, this paper contributes
a new dataset called MSMT17 with many important features, e.g., 1) the raw
videos are taken by a 15-camera network deployed in both indoor and outdoor
scenes, 2) the videos cover a long period of time and present complex lighting
variations, and 3) it contains currently the largest number of annotated
identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe
that a domain gap commonly exists between datasets, which causes a severe
performance drop when training and testing on different datasets. As a
result, available training data cannot be effectively leveraged for new
testing domains. To relieve the expensive costs of annotating new training
samples, we propose a Person Transfer Generative Adversarial Network (PTGAN) to
bridge the domain gap. Comprehensive experiments show that the domain gap could
be substantially narrowed by PTGAN.
Comment: 10 pages, 9 figures; accepted in CVPR 201
A Layer Decomposition-Recomposition Framework for Neuron Pruning towards Accurate Lightweight Networks
Neuron pruning is an efficient method to compress the network into a slimmer
one for reducing the computational cost and storage overhead. Most
state-of-the-art results are obtained in a layer-by-layer optimization mode,
which discards the unimportant input neurons and uses the surviving ones to
reconstruct output neurons that approximate the original ones. However, an
overlooked problem is that the information loss accumulates as the number of
layers increases, since the surviving neurons no longer encode all of the
original information. A better alternative is
to propagate the entire useful information to reconstruct the pruned layer
instead of directly discarding the less important neurons. To this end, we
propose a novel Layer Decomposition-Recomposition Framework (LDRF) for neuron
pruning, by which each layer's output information is recovered in an embedding
space and then propagated to reconstruct the following pruned layers with
useful information preserved. We mainly conduct our experiments on the ILSVRC-12
benchmark with VGG-16 and ResNet-50. Notably, our
results before end-to-end fine-tuning are significantly superior owing to the
information-preserving property of our proposed framework. With end-to-end
fine-tuning, we achieve state-of-the-art results of 5.13x and 3x speed-up with
only 0.5% and 0.65% top-5 accuracy drop respectively, which outperform the
existing neuron pruning methods.
Comment: accepted by AAAI19 as ora
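To make the information-loss argument concrete, the sketch below prunes the input neurons of a single linear layer and measures how far the pruned output drifts from the original. It illustrates the layer-by-layer baseline that the abstract criticizes, not LDRF itself. The L1-norm importance score, the function names, and the toy weights are assumptions made for illustration, and the step that refits the surviving weights is omitted.

```python
def prune_inputs(weights, keep):
    """Pick the `keep` input neurons with the largest L1 norm of outgoing weights.

    weights[j][i] is the weight from input neuron i to output neuron j.
    The L1 importance score is an assumed, commonly used heuristic.
    """
    n_in = len(weights[0])
    importance = [sum(abs(row[i]) for row in weights) for i in range(n_in)]
    ranked = sorted(range(n_in), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])


def forward(weights, x, kept=None):
    """Linear layer output using all inputs, or only the kept ones."""
    idx = range(len(x)) if kept is None else kept
    return [sum(row[i] * x[i] for i in idx) for row in weights]


# Toy layer: 3 input neurons, 2 output neurons.
W = [[2.0, 0.5, -1.0],
     [1.0, 0.0, 3.0]]
x = [1.0, 2.0, 1.0]

kept = prune_inputs(W, keep=2)
full = forward(W, x)
pruned = forward(W, x, kept)
error = sum((a - b) ** 2 for a, b in zip(full, pruned))
```

The nonzero `error` is the information discarded at this layer. In a deeper network the next layer receives the already-perturbed output, which is how the loss described in the abstract builds up from layer to layer.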